06 - Project 1
🧬 Project: Neural Classification of Erythrocyte Anomalies
1. Project Overview
In low-resource hematology settings, manual screening of blood smears for intracellular parasites is time-consuming and error-prone. This project aims to automate the triage process by developing a Deep Learning model capable of distinguishing between healthy erythrocytes (red blood cells) and those containing a specific intracellular pathogen.
Your task is to design, train, and validate a Convolutional Neural Network (CNN) to perform binary classification on single-cell images.
2. The Dataset
Dataset download link: Dataset
You are provided with a proprietary dataset consisting of segmented “patches” (Regions of Interest) extracted from thin blood smear slides stained with Giemsa. Each image contains a single cell.
The data has been anonymized and split into two distinct sets:
- train: A folder containing approximately 22,000 labeled images.
  - Sub-folder negative (class 0): Represents Healthy/Control samples.
  - Sub-folder positive (class 1): Represents Infected/Anomalous samples.
- test: A folder containing approximately 5,500 unlabeled images.
  - You do not have the ground truth labels for this set.
  - You will use this set to generate your final predictions for grading.
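The folder layout above maps directly onto class labels. As a minimal sketch (standard library only), the helper below collects (path, label) pairs from the two training sub-folders. The function name, the `CLASS_TO_LABEL` mapping, and the list of accepted file extensions are assumptions made for illustration; check them against the actual files in the download.

```python
from pathlib import Path

# Folder name -> class index, following the dataset description.
CLASS_TO_LABEL = {"negative": 0, "positive": 1}

# Assumption: the image format is not stated, so several common ones are accepted.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def list_labeled_images(train_dir):
    """Return a list of (Path, label) pairs found under train_dir."""
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = Path(train_dir) / class_name
        for path in sorted(class_dir.iterdir()):
            # Skip anything that does not look like an image file.
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```

Counting the labels in the returned list is a quick way to check whether the two classes are balanced before choosing a split or a loss weighting.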
⚠️ Data Note: The images possess varying resolutions and aspect ratios. A crucial part of your pipeline will be establishing a robust pre-processing strategy to normalize these inputs before feeding them into your network.
3. Technical Objectives
A. Data Pre-processing & Augmentation
Since the input dimensions vary, you must implement a pipeline to:
- Resize/Rescale images to a fixed input size (e.g., \(64\times64\), \(128\times128\), or \(224\times224\)) suitable for your architecture.
- Normalize pixel intensity values.
- Implement Data Augmentation on the training set to prevent overfitting. Consider rotations, flips, and brightness adjustments to simulate varying lighting conditions in microscopy.
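One way to handle the varying aspect ratios is to scale each image so its longer side matches the target size and pad the remainder, rather than stretching it. The sketch below only computes the geometry and the intensity normalization in plain Python; it is not tied to any image library. The helper names, the target size of 128, and the mean/std of 0.5 are placeholder assumptions (statistics computed from the training set, or those expected by a pre-trained backbone, may be more appropriate).

```python
def letterbox_geometry(width, height, target=128):
    """Return (new_w, new_h, pad_left, pad_top) for fitting an image into a
    target x target square without distorting its aspect ratio."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Center the resized image; any odd leftover pixel goes to the right/bottom.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize_pixels(values, mean=0.5, std=0.5):
    """Map 8-bit intensities (0..255) to [0, 1], then standardize."""
    return [((v / 255.0) - mean) / std for v in values]


# A 200x100 patch fits a 128x128 input as 128x64 with 32 rows of padding on top.
print(letterbox_geometry(200, 100))  # (128, 64, 0, 32)
print(normalize_pixels([0, 255]))    # [-1.0, 1.0]
```

Plain resizing to a square is simpler and may work just as well for roughly round cells; the padded variant is worth comparing if elongated patches get visibly distorted.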
B. Neural Network Architecture
You are required to construct a Convolutional Neural Network. You may choose one of two paths:
- Custom Architecture: Design your own stack of Convolutional, Max-Pooling, and Dense layers. You must justify your choice of kernel sizes and depth.
- Transfer Learning: Utilize a pre-trained backbone (e.g., VGG-16, ResNet-18, MobileNet) with a custom classification head. If you choose this, you must explain your freezing/unfreezing strategy.
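When justifying kernel sizes and depth for a custom architecture, it helps to trace how the spatial size shrinks through the stack. The sketch below uses the standard output-size formula for convolution and pooling layers without dilation, out = floor((in + 2·padding − kernel) / stride) + 1. The three-stage stack is a hypothetical example, not a recommended design.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_stack(input_size, layers):
    """Return the spatial size after each (kernel, stride, padding) layer."""
    sizes = [input_size]
    for kernel, stride, padding in layers:
        sizes.append(conv_out(sizes[-1], kernel, stride, padding))
    return sizes


# Hypothetical stack: three stages of (3x3 conv, padding 1) + (2x2 max-pool, stride 2).
stack = [(3, 1, 1), (2, 2, 0)] * 3
print(trace_stack(64, stack))  # [64, 64, 32, 32, 16, 16, 8]
```

With a 64×64 input this stack ends at 8×8, so the first Dense layer would see 8 × 8 × (number of channels in the last convolution) features after flattening.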
C. Training Loop
- Loss Function: Select a loss function appropriate for binary classification.
- Optimizer: Use an adaptive optimizer or SGD with momentum.
- Validation: You must further split the provided train set to create your own internal validation set (e.g., an 80/20 split) to monitor loss curves and stop training before overfitting occurs.
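The validation requirement can be sketched independently of any deep learning framework. The block below shows a per-class (stratified) 80/20 split and a small early-stopping helper that watches the validation loss. The names `stratified_split` and `EarlyStopping`, the fixed seed, and the patience value are illustrative assumptions; framework utilities that do the same job are equally valid.

```python
import random


def stratified_split(paths_by_class, val_fraction=0.2, seed=0):
    """Split each class separately so both sets keep the class proportions.

    paths_by_class maps a label (0 or 1) to a list of file paths.
    Returns (train, val), each a list of (path, label) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label, paths in sorted(paths_by_class.items()):
        shuffled = list(paths)
        rng.shuffle(shuffled)
        n_val = int(len(shuffled) * val_fraction)
        val += [(p, label) for p in shuffled[:n_val]]
        train += [(p, label) for p in shuffled[n_val:]]
    return train, val


class EarlyStopping:
    """Signal a stop after `patience` epochs without validation improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `step` would be called once per epoch with the validation loss, and the model weights from the best epoch would be the ones kept for the final test predictions.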
4. Deliverables
Part 1: Short report
Your short report should be a PDF document containing the following information:
- Neural network architecture, loss function, optimizer, and hyperparameters used.
- Visualizations produced with Captum or a similar library (e.g., Grad-CAM) that interpret the model's decisions on sample images.
Part 2: The “Blind” Test Submission
You must run your final, trained model on the images in the test folder.
- Generate a CSV file named submission.csv.
- The file must contain a header row and have two columns: filename and prediction (0 for negative or 1 for positive).
- Ensure the filenames match exactly.
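The submission format above can be produced with the standard csv module. This is a minimal sketch: the function name, the 0.5 decision threshold, and the assumption that the model outputs one probability of the positive class per file are illustrative choices. Whether filenames should include the extension is not stated here, so the sketch writes the dictionary keys unchanged.

```python
import csv


def write_submission(probabilities, path="submission.csv", threshold=0.5):
    """Write filename,prediction rows from a {filename: P(positive)} mapping."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "prediction"])  # required header row
        for filename in sorted(probabilities):
            # 1 = positive (infected), 0 = negative (healthy)
            label = 1 if probabilities[filename] >= threshold else 0
            writer.writerow([filename, label])
```

A quick sanity check before submitting is to confirm that the number of data rows equals the number of files in the test folder and that every prediction is 0 or 1.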